Pseudotime Analysis: Tracing Cell Development Over Time
This notebook analyzes how cells change and develop over time using pseudotime analysis.
What is Pseudotime?
Pseudotime is a computational ordering of cells along a continuous trajectory, inferred from their gene expression patterns. It lets us place cells on a developmental timeline even when we only have snapshots taken at a few discrete time points.
What We'll Do:
- Run two different pseudotime methods:
  - DPT (Diffusion Pseudotime): Uses diffusion maps to trace cell development
  - Palantir: A more sophisticated method that can handle multiple cell fates
- Compare the results: See how well both methods agree on cell development timing
Preparing the Data for Analysis
Before we can analyze pseudotime, we need to clean and prepare our data:
- Normalize: Make sure all cells have similar total gene expression levels
- Log transform: Reduce the impact of very highly expressed genes
- Find variable genes: Identify genes that change the most between cells (these are most informative)
- Scale: Standardize gene expression values
- Reduce dimensions: Use PCA to focus on the most important patterns
- Build neighborhood graph: Find which cells are similar to each other
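To make the six steps concrete, here is a minimal NumPy sketch that runs them on random toy counts. It is an illustration under stated assumptions, not the notebook's own code: the function name `preprocess`, its parameters, and the variance-based gene selection are all hypothetical simplifications. In Scanpy, these steps usually correspond to `sc.pp.normalize_total`, `sc.pp.log1p`, `sc.pp.highly_variable_genes`, `sc.pp.scale`, `sc.tl.pca`, and `sc.pp.neighbors`; whether this notebook used exactly those calls is an assumption.

```python
import numpy as np

def preprocess(counts, n_top_genes=50, n_pcs=10, n_neighbors=5, target_sum=1e4):
    """Toy version of the preprocessing steps (hypothetical helper)."""
    counts = np.asarray(counts, dtype=float)

    # 1. Normalize: rescale every cell (row) to the same total count.
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    normalized = counts / totals * target_sum

    # 2. Log transform: log(1 + x) dampens very highly expressed genes.
    logged = np.log1p(normalized)

    # 3. Variable genes: keep the genes with the highest variance
    #    (a simplification of dispersion-based selection).
    n_top = min(n_top_genes, logged.shape[1])
    top_genes = np.sort(np.argsort(logged.var(axis=0))[::-1][:n_top])
    selected = logged[:, top_genes]

    # 4. Scale: zero mean and unit variance per gene.
    std = selected.std(axis=0)
    std[std == 0] = 1.0
    scaled = (selected - selected.mean(axis=0)) / std

    # 5. Reduce dimensions: PCA via SVD of the centered matrix.
    n_pcs = min(n_pcs, min(scaled.shape) - 1)
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]

    # 6. Neighborhood graph: k nearest neighbors in PCA space.
    d2 = ((pcs[:, None, :] - pcs[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :n_neighbors]
    return pcs, neighbors, top_genes

# Toy data: 40 cells x 100 genes of random counts (not the notebook's data).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=2.0, size=(40, 100))
pcs, neighbors, hvg_idx = preprocess(toy_counts)
```

The brute-force distance matrix is fine for a toy example; real datasets need an approximate nearest-neighbor search.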
Let's first look at the actual time points and cell types in the data.
For pseudotime analysis, we need to define:
- Start cell: Where development begins (a cell from the beginning of the experiment, at 0hr)
- Terminal states: Where development ends (like differentiated cell types)
In our case:
- We start with an undifferentiated cell
- We end with two possible fates: prespore cells and prestalk cells
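One simple way to turn these definitions into concrete cell indices is sketched below. This is a hypothetical heuristic, not necessarily what the notebook does: for each group of interest we pick the cell closest to the group's centroid in some embedding. The helper `pick_representative`, the toy coordinates, and the labels `"late"` and `"undiff"` are all made up for illustration.

```python
import numpy as np

def pick_representative(embedding, mask):
    """Index of the masked cell closest to the centroid of the masked group."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        raise ValueError("no cells match the mask")
    centroid = embedding[idx].mean(axis=0)
    dists = np.linalg.norm(embedding[idx] - centroid, axis=1)
    return int(idx[np.argmin(dists)])

# Toy 2D embedding with made-up labels (assumptions, not the real data).
embedding = np.array([
    [0.0, 0.0], [0.2, 0.0], [0.1, 0.1],   # early, undifferentiated cells
    [5.0, 5.0], [5.2, 5.0],               # prespore-like cells
    [5.0, -5.0], [5.1, -5.2],             # prestalk-like cells
])
time_label = np.array(["0hr", "0hr", "0hr", "late", "late", "late", "late"])
cell_type = np.array(["undiff", "undiff", "undiff",
                      "prespore", "prespore", "prestalk", "prestalk"])

# Start cell: a representative 0hr cell.
start_cell = pick_representative(embedding, time_label == "0hr")

# Terminal states: one representative cell per fate.
terminal_states = {
    fate: pick_representative(embedding, cell_type == fate)
    for fate in ("prespore", "prestalk")
}
```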
Method 1 - DPT (Diffusion Pseudotime)
DPT works like this:
- Diffusion map: Treat each cell as a node in a neighborhood graph whose edges are weighted by transcriptomic similarity, so closely related cells share stronger connections. Random walks on this graph define a low-dimensional diffusion space.
- Pseudotime calculation: From a chosen start cell, compute the diffusion distance (based on random walks over the graph) to every other cell. This measures how far each cell lies along the developmental manifold.
This gives us a simple timeline of development - early cells have low pseudotime, late cells have high pseudotime.
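The two steps can be sketched in a few lines of NumPy. This is a simplified illustration in the spirit of DPT, not Scanpy's implementation: it uses a dense Gaussian kernel instead of a kNN graph, skips density normalization, and the function name `diffusion_pseudotime` and its parameters are assumptions. Each diffusion component is weighted by λ / (1 − λ), which corresponds to summing the diffusion distance over all random-walk lengths.

```python
import numpy as np

def diffusion_pseudotime(points, root, sigma=1.0, n_comps=10):
    """Simplified diffusion pseudotime from a root cell (illustrative sketch)."""
    points = np.asarray(points, dtype=float)

    # Step 1: diffusion map. Gaussian similarities between all pairs of cells.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))

    # Symmetric normalization D^-1/2 K D^-1/2 has the same eigenvalues as the
    # random-walk matrix D^-1 K but can be decomposed with eigh.
    degree = kernel.sum(axis=1)
    sym = kernel / np.sqrt(np.outer(degree, degree))
    evals, evecs = np.linalg.eigh(sym)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    # Convert to right eigenvectors of the random-walk matrix and drop the
    # first, trivial component (eigenvalue 1, constant vector).
    coords = evecs / np.sqrt(degree)[:, None]
    evals, coords = evals[1:n_comps + 1], coords[:, 1:n_comps + 1]

    # Step 2: pseudotime. Distance to the root in diffusion space, with each
    # component weighted by lambda / (1 - lambda).
    weights = evals / (1.0 - evals)
    diff = (coords - coords[root]) * weights
    dpt = np.sqrt((diff ** 2).sum(axis=1))
    return dpt / dpt.max()  # rescale to [0, 1]

# Toy example: 30 "cells" spread along a line, with the root at one end.
toy_line = np.linspace(0.0, 10.0, 30)[:, None]
toy_pseudotime = diffusion_pseudotime(toy_line, root=0)
```

On this toy line the root gets pseudotime 0 and cells farther along the line get larger values, which is the behavior described above.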
Method 2 - Palantir Analysis
Palantir is a more sophisticated method that can handle cells developing into multiple different fates (like our prespore and prestalk cells).
Palantir works in several steps:
- Diffusion maps: Build a detailed map of cell relationships
- Multiscale space: Look at patterns at different scales of resolution
- Trajectory calculation: Calculate the probability that each cell will become each final cell type
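The last step can be illustrated with an absorbing Markov chain, where the terminal states are absorbing and every other cell is transient. The sketch below is not Palantir's code: the function name `absorption_probabilities` and the tiny hand-written transition matrix are assumptions made for illustration (Palantir derives its chain from the neighbor graph and pseudotime). It shows the standard computation B = (I − Q)⁻¹ R, where Q holds transient-to-transient and R holds transient-to-terminal transition probabilities.

```python
import numpy as np

def absorption_probabilities(transition, terminal):
    """Probability that each cell is eventually absorbed in each terminal state."""
    transition = np.asarray(transition, dtype=float)
    terminal = list(terminal)
    n = transition.shape[0]
    transient = [i for i in range(n) if i not in terminal]

    Q = transition[np.ix_(transient, transient)]  # transient -> transient
    R = transition[np.ix_(transient, terminal)]   # transient -> terminal

    # Fundamental matrix N = (I - Q)^-1 and B = N R.
    # Solving the linear system avoids forming the inverse explicitly.
    B = np.linalg.solve(np.eye(len(transient)) - Q, R)

    probs = np.zeros((n, len(terminal)))
    probs[transient] = B
    probs[terminal, np.arange(len(terminal))] = 1.0  # terminals stay put
    return probs

# Toy chain with 6 cells (made-up numbers):
# 0 -> 1 -> {2, 3} -> {4 = prespore, 5 = prestalk}
P = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.0, 0.0, 0.2, 0.8],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
])
branch_probs = absorption_probabilities(P, terminal=[4, 5])
```

Each row of `branch_probs` sums to 1. In this toy chain the start cell (row 0) reaches the prespore state with probability 0.7 × 0.9 + 0.3 × 0.2 = 0.69.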
Sampling and flocking waypoints...
Time for determining waypoints: 0.06771524349848429 minutes
Determining pseudotime...
Shortest path distances using 30-nearest neighbor graph...
Time for shortest paths: 0.3673614501953125 minutes
Iteratively refining the pseudotime...
Correlation at iteration 1: 0.9999
Entropy and branch probabilities...
Markov chain construction...
Computing fundamental matrix and absorption probabilities...
Project results to all cells...
Visualizing Palantir Results
Palantir creates several useful plots:
- Pseudotime: How far along development each cell is
- Branch probabilities: How likely each cell is to become prespore vs prestalk
- Entropy: How "decided" each cell is (low entropy = committed to one fate)
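To see why low entropy means "committed", here is a small sketch that computes the Shannon entropy of each cell's branch probabilities. The function name `fate_entropy`, the use of the natural logarithm, and the example probabilities are assumptions for illustration.

```python
import numpy as np

def fate_entropy(branch_probs):
    """Shannon entropy (natural log) of each row of branch probabilities."""
    p = np.asarray(branch_probs, dtype=float)
    # Treat 0 * log(0) as 0 by replacing zeros inside the log with 1.
    safe = np.where(p > 0, p, 1.0)
    return -(p * np.log(safe)).sum(axis=1)

# Columns: probability of prespore, probability of prestalk (made-up values).
example_probs = np.array([
    [0.5, 0.5],    # undecided cell: maximal entropy for two fates
    [0.95, 0.05],  # mostly committed to prespore
    [0.0, 1.0],    # fully committed to prestalk: zero entropy
])
example_entropy = fate_entropy(example_probs)
```

With two fates, entropy ranges from 0 (fully committed) to ln 2 (an even split).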
We can identify cells that are clearly committed to specific developmental paths.
Finally, we can draw arrows showing the predicted developmental paths from our starting cell to the final cell types.
[Figure: UMAP (UMAP1 vs. UMAP2) colored by palantir_pseudotime]
Comparing Both Methods
Now let's compare how well DPT and Palantir agree with each other and with our known information:
- Experimental time: What we actually measured
- DPT pseudotime: What the simpler method predicted
- Palantir pseudotime: What the more sophisticated method predicted
Both methods give similar results.
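One way to put a number on this agreement is a rank correlation, since pseudotime values from different methods are on different scales and only their ordering is comparable. The sketch below implements Spearman correlation in NumPy. The helper names and all three toy vectors are made up for illustration and are not results from this notebook.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# Toy values for 8 cells (NOT results from this notebook).
experimental_time = np.array([0, 0, 4, 4, 8, 8, 12, 12])
dpt_toy = np.array([0.00, 0.05, 0.30, 0.35, 0.60, 0.55, 0.90, 1.00])
palantir_toy = np.array([0.00, 0.10, 0.25, 0.40, 0.50, 0.65, 0.95, 1.00])

rho_methods = spearman(dpt_toy, palantir_toy)
rho_dpt_time = spearman(dpt_toy, experimental_time)
rho_palantir_time = spearman(palantir_toy, experimental_time)
```

Values close to 1 mean the two orderings largely agree; values near 0 mean they are unrelated.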
Comparison by Cell Type
Let's create violin plots to see how pseudotime values are distributed within each cell type, using the classification of cell types from the previous analysis.
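As a numeric companion to the violin plots, the quartiles of pseudotime within each cell type can be tabulated directly. This is a minimal sketch with a hypothetical helper `summarize_by_group` and made-up values, not the notebook's plotting code.

```python
import numpy as np

def summarize_by_group(values, groups):
    """Quartiles of `values` within each group (the core of a violin/box plot)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    summary = {}
    for group in np.unique(groups):
        q1, median, q3 = np.percentile(values[groups == group], [25, 50, 75])
        summary[str(group)] = {
            "q1": float(q1), "median": float(median), "q3": float(q3),
        }
    return summary

# Toy pseudotime values and cell type labels (made up for illustration).
toy_pt = np.array([0.0, 0.1, 0.2, 0.8, 0.9, 1.0])
toy_types = np.array(["undiff", "undiff", "undiff",
                      "prespore", "prespore", "prespore"])
pt_summary = summarize_by_group(toy_pt, toy_types)
```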
Analysis of pseudotime using UCE embeddings
Now, instead of the preprocessed counts, let's repeat the analysis using the UCE (Universal Cell Embeddings) embeddings as input.
Sampling and flocking waypoints...
Time for determining waypoints: 0.08667874733606974 minutes
Determining pseudotime...
Shortest path distances using 30-nearest neighbor graph...
Time for shortest paths: 0.4221353809038798 minutes
Iteratively refining the pseudotime...
Correlation at iteration 1: 0.9998
Correlation at iteration 2: 1.0000
Entropy and branch probabilities...
Markov chain construction...
Computing fundamental matrix and absorption probabilities...
Project results to all cells...
Visualizing Palantir Results
Palantir creates several useful plots:
- Pseudotime: How far along development each cell is
- Branch probabilities: How likely each cell is to become prespore vs prestalk
- Entropy: How "decided" each cell is (low entropy = committed to one fate)
We can identify cells that are clearly committed to specific developmental paths.
Finally, we can draw arrows showing the predicted developmental paths from our starting cell to the final cell types.
[Figure: UMAP (UMAP1 vs. UMAP2) colored by palantir_pseudotime]
Comparing Both Methods
Now let's compare how well DPT and Palantir agree with each other and with our known information:
- Experimental time: What we actually measured
- DPT pseudotime: What the simpler method predicted
- Palantir pseudotime: What the more sophisticated method predicted
Comparison by Cell Type
Let's create violin plots to see how pseudotime values are distributed within each cell type, using the classification of cell types from the previous analysis.